Sarah Strochak, Kyle Ueyama, Aaron R. Williams
What is 2 + 2?
## [1] 4
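The slide's code was not preserved. A minimal sketch of what was most likely typed: a single arithmetic expression at the console, which R evaluates and prints as a length-one vector.

```r
# R evaluates the expression and prints the result; [1] marks the first element
2 + 2
```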
What is the median price of diamonds with carat > 1 and a Good cut?
## # A tibble: 1 x 1
## `median(price)`
## <int>
## 1 6412
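One way to answer this question in code is sketched below. It assumes dplyr's filter() and summarize() and the diamonds data set that ships with ggplot2, because the slide's original code was not preserved. The unnamed summary produces a column called `median(price)`, which is consistent with the printed output above.

```r
# Assumes the dplyr and ggplot2 packages are installed;
# ggplot2 provides the diamonds data set
library(dplyr)
library(ggplot2)

# Keep diamonds larger than one carat with a Good cut, then take the median price
good_large <- diamonds %>%
  filter(carat > 1, cut == "Good") %>%
  summarize(median(price))

good_large
```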
How could increasing the retirement age affect the poverty rates of Hispanic women ages 62 and older?
Deliberate steps should be taken to minimize the chance of making an error and maximize the chance of catching errors when errors inevitably occur.
Computational reproducibility should be embraced to improve accuracy, promote transparency, and prove the quality of analytic work.
Code should be written so humans can easily understand what’s happening, even if that occasionally sacrifices machine performance.
Analyses should be designed so strangers can understand each and every step without additional instruction or inquiry from the original analyst.
Research and data are non-rival and non-exclusive. They are public goods that should be widely and easily shared. Decisions about tools, methods, data, and language during the research process should be made in ways that promote the ability of anyone and everyone to access an analysis.
Analysts should seek to make all parts of the research process more efficient by communicating clearly, adopting best practices, and managing computation.
.R scripts and .Rmd (R Markdown) files
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” ~ Hadley Wickham
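Here is a small before-and-after illustration of the quote. It loosely follows the tidyverse style guide's advice on spacing and naming, and the variable names are hypothetical. Both versions compute the same value. The second is simply easier to read.

```r
# Legal but hard to read: no spaces, cryptic names, two statements on one line
x<-c(90,85,77);y<-mean(x)

# Same computation with spaces around operators and after commas,
# and descriptive snake_case names
exam_scores <- c(90, 85, 77)
mean_score <- mean(exam_scores)
```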
R packages: collections of R, C, C++, and Fortran code that extend the functionality of R.
The Comprehensive R Archive Network (CRAN) was introduced in 1997.
Repository of popular R packages with basic standards and quality control.
The tidyverse: a comprehensive set of tools for data science
Core: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
R for Data Science: a free text by Hadley Wickham and Garrett Grolemund
Scalars (do not exist in R)
Vectors
## [1] 1 2 3 4 5
Matrices
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
Data frames, multidimensional arrays
## # A tibble: 4 x 4
## name awake brainwt bodywt
## <chr> <dbl> <dbl> <dbl>
## 1 Cheetah 11.9 NA 50
## 2 Owl monkey 7 0.0155 0.48
## 3 Mountain beaver 9.6 NA 1.35
## 4 Greater short-tailed shrew 9.1 0.00029 0.019
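The code behind these outputs was not preserved. The base-R sketch below produces the same kinds of objects. The printed tibble appears to show the first four rows of the name, awake, brainwt, and bodywt columns of ggplot2's msleep data, so the data frame below re-types those printed values by hand. The name mammals is hypothetical.

```r
# There are no scalars: a single number is a vector of length one
is.vector(5)   # TRUE
length(5)      # 1

# Vector: an ordered collection of values of one type
v <- 1:5
v

# Matrix: a vector with two dimensions, filled column by column
m <- matrix(1:6, nrow = 2)
m

# Data frame: equal-length columns that may hold different types
# (values copied from the four rows printed above)
mammals <- data.frame(
  name = c("Cheetah", "Owl monkey", "Mountain beaver", "Greater short-tailed shrew"),
  awake = c(11.9, 7, 9.6, 9.1),
  brainwt = c(NA, 0.0155, NA, 0.00029),
  bodywt = c(50, 0.48, 1.35, 0.019)
)
mammals
```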
Character
## [1] "a" "b" "c" "d" "e"
Numeric
## [1] 1 2 3 4 5
Logical
## [1] TRUE TRUE FALSE TRUE FALSE
Factor
## [1] good ok bad ok ok
## Levels: good ok bad
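The sketch below shows how a vector of each type could be created. The explicit levels argument is an assumption, inferred from the printed level order (good ok bad). That order is not alphabetical, which is what factor() would produce by default.

```r
# Character: text values
letters_vec <- c("a", "b", "c", "d", "e")

# Numeric: numbers (stored as doubles by default)
nums <- c(1, 2, 3, 4, 5)

# Logical: TRUE or FALSE
flags <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Factor: categorical values with a fixed set of levels;
# supplying levels keeps the intended order instead of sorting alphabetically
ratings <- factor(c("good", "ok", "bad", "ok", "ok"), levels = c("good", "ok", "bad"))

class(letters_vec)  # "character"
class(nums)         # "numeric"
class(flags)        # "logical"
class(ratings)      # "factor"
```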
NA is R’s encoding for missing values
## [1] NA
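The sketch below shows how NA behaves in practice. It uses a hypothetical vector called incomes that has one missing value. Missing values propagate through calculations unless you remove them explicitly. Because any comparison with NA also returns NA, use is.na() to test for missingness rather than == NA.

```r
# A numeric vector with one missing value
incomes <- c(52000, NA, 61000)

is.na(incomes)                # FALSE TRUE FALSE
mean(incomes)                 # NA: the missing value propagates
mean(incomes, na.rm = TRUE)   # 56500: NAs are dropped before averaging
```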
R can hold many different objects at the same time. Storing the result of code requires assignment with the <- operator.
## [1] 4
## [1] 4
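The two printed results above are consistent with code along the following lines. This is a sketch, and x is a hypothetical name.

```r
# Without assignment, the result is printed and then discarded
2 + 2

# With assignment, the result is stored under a name for later use
x <- 2 + 2
x
```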
Arguments by position
## [1] 2.5
Arguments by name
## [1] 2.5
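Both outputs equal 2.5, which is the mean of 1 through 4. The sketch below assumes mean() was the function shown, because the original calls were not preserved. Naming arguments makes optional ones such as na.rm explicit and lets them appear in any order.

```r
values <- c(1, 2, 3, 4)

# By position: the first argument of mean() is x
mean(values)

# By name: order no longer matters
mean(x = values)
mean(na.rm = TRUE, x = c(values, NA))
```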
Function documentation
?mean
Rule of three: never copy and paste the same code three or more times; write a function instead.
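# Label each element of a numeric vector as "even!" or "odd!"
test_oddness <- function(x) {
  # x %% 2 is the remainder after dividing by 2; it is 0 for even numbers
  ifelse(test = x %% 2 == 0, yes = "even!", no = "odd!")
}
test_oddness(1:10)
## [1] "odd!" "even!" "odd!" "even!" "odd!" "even!" "odd!" "even!"
## [9] "odd!" "even!"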
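The rule of three can be applied as sketched below, borrowing the rescale01() idea from R for Data Science. The data and column names are hypothetical. Instead of pasting the same rescaling formula once per column, write it once as a function and apply it to every column.

```r
# Hypothetical data with three columns on different scales
df <- data.frame(x1 = c(1, 5, 10), x2 = c(2, 4, 8), x3 = c(0, 50, 100))

# Copy-and-paste approach (error-prone): the same formula typed three times
# df$x1 <- (df$x1 - min(df$x1)) / (max(df$x1) - min(df$x1))
# df$x2 <- (df$x2 - min(df$x2)) / (max(df$x2) - min(df$x2))
# df$x3 <- (df$x3 - min(df$x3)) / (max(df$x3) - min(df$x3))

# Function approach: write the logic once, then apply it to each column
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df[] <- lapply(df, rescale01)
df
```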
What will it take to convince you that your code is correct?
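One answer is to write down cases where you already know the correct result and then check them automatically. The sketch below applies base R's stopifnot() to the test_oddness() function from the previous slide. Packages such as testthat offer a fuller testing framework.

```r
test_oddness <- function(x) {
  ifelse(test = x %% 2 == 0, yes = "even!", no = "odd!")
}

# Checks with known answers; stopifnot() raises an error if any check is FALSE
stopifnot(
  test_oddness(2) == "even!",
  test_oddness(7) == "odd!",
  test_oddness(0) == "even!",
  test_oddness(-3) == "odd!",   # R's %% returns 1 here, so negatives work too
  identical(test_oddness(c(1, 2)), c("odd!", "even!"))
)
```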
Organize projects into data/, scripts/, and outputs/ directories.
Use / in file paths regardless of operating system.
Use setwd() to avoid retyping most of an absolute file path.
.Rproj files (RStudio projects) are a superior solution, but they are only available in RStudio.
sessionInfo()
Submit install.packages("tidyverse") to the console.
library(tidyverse)
Photo by StataCorp LP, CC BY-SA 4.0, Unaltered
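The sketch below shows portable file paths and session reporting in base R. The file name raw-data.csv is hypothetical. file.path() joins its pieces with "/" by default, and R accepts that separator on Windows, macOS, and Linux.

```r
# Build a relative path that works on any operating system
raw_path <- file.path("data", "raw-data.csv")
raw_path    # "data/raw-data.csv"

# Relative paths resolve against the working directory, which getwd() reports
# (setwd() or opening an RStudio project changes it)
getwd()

# Record the R version, platform, and loaded packages alongside your results
sessionInfo()
```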
Source is unknown
Comments